Induction of Classifications from Linguistic Data
Abstract
We present a flexible approach for extracting hierarchical classifications from linguistic data. To this end, the framework of observational logic is introduced, which extends the logic that underlies standard Formal Concept Analysis by allowing disjunctive rules and exclusions. We give a rigorous mathematical characterization of how the chosen rule type affects the structure of the induced hierarchy. The framework is applied to the induction of hierarchical classifications from linguistic databases. The pros and cons of several types of hierarchies are discussed in detail with respect to criteria such as compactness of representation, suitability for inference tasks, and intelligibility for the human user.

1 THE LOGIC OF LINGUISTIC CLASSIFICATION

A simple method for classifying (linguistic) data is provided by taxonomic trees, which are ubiquitous in linguistic textbooks. For example, nominal words are traditionally subdivided into pronouns, nouns, adjectives, etc.; pronouns are further subdivided into interrogative pronouns, personal pronouns, etc. From a logical point of view, each concept of a taxonomic tree implies its superordinate concept; e.g., pronoun implies nominal word. Furthermore, any two subconcepts of the same concept are incompatible, e.g., noun and adjective. In addition, classification by taxonomic trees is often assumed to be exhaustive in the sense that every concept implies the disjunction of its immediate subconcepts.

Systemic networks, which have their roots in systemic grammar (e.g. [10]), provide a more sophisticated formalism for presenting linguistic classification. Figure 1 shows a small fragment of such a network. The classifiers aligned to the right of a bar constitute a
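The three logical readings of a taxonomic tree described in Section 1 (implication of the superordinate concept, incompatibility of sibling concepts, and exhaustiveness) can be illustrated with a minimal Python sketch. It is not the paper's formalism: the dictionary `TAXONOMY`, the function `taxonomy_rules`, and the tuple encoding of rules are hypothetical names chosen for illustration. The toy tree also treats the subconcepts named in the text as complete, although the text lists them with "etc.".

```python
from itertools import combinations

# Hypothetical toy taxonomy built from the examples in the text.
# Assumption: the listed subconcepts are treated as the complete set.
TAXONOMY = {
    "nominal word": ["pronoun", "noun", "adjective"],
    "pronoun": ["interrogative pronoun", "personal pronoun"],
}


def taxonomy_rules(tree):
    """Read three kinds of rules off a taxonomic tree.

    Returns a triple (implications, exclusions, disjunctions):
      implications: (child, parent) pairs, "child implies parent"
      exclusions:   (a, b) pairs of siblings, "a and b are incompatible"
      disjunctions: (parent, children) pairs,
                    "parent implies the disjunction of its children"
    """
    implications = []
    exclusions = []
    disjunctions = []
    for parent, children in tree.items():
        # Every concept implies its superordinate concept.
        implications.extend((child, parent) for child in children)
        # Any two subconcepts of the same concept are incompatible.
        exclusions.extend(combinations(children, 2))
        # Exhaustiveness: the parent implies one of its immediate subconcepts.
        disjunctions.append((parent, tuple(children)))
    return implications, exclusions, disjunctions


implications, exclusions, disjunctions = taxonomy_rules(TAXONOMY)
```

In this encoding, a plain implication corresponds to the rule type of standard Formal Concept Analysis, while the exclusion and disjunction lists correspond to the additional rule types that the abstract attributes to observational logic.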
Publication date: 2002